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Abstract 


For  Monte  Carlo  simulations  of  systems  of  size  M that  use  the  method 
of  Bortz,  Kalos,  and  Liebowitz  (BKL),  the  best  computer  time  per  event  has 
been  0(M^/^).  We  present  two  new  methods  whose  computer  time  per  event 
is  0(M^/^)  or  O(logM).  In  practice,  for  typical  simulation  sizes,  K = 4 or 
= 5 is  fastest,  requiring  even  less  computer  time  than  the  O(logM)  method. 
For  typical  simulation  sizes,  we  are  able  to  achieve  speedup  factors  of  5 to  7 

over  the  0(M^/^)  technique. 
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Monte  Carlo  (MC)  sirnnlations  use  a noticeable  fraction  of  the  available  time  on  many 
supercomputers  and  scientific  workstations.  This  paper  presents  algorithms  to  do  the 
same  simulations  in  substantially  less  computer  time.  The  algorithms  are  applicable  both 
to  equilibrium  [1,2]  and  kinetic  [3]  Monte  Carlo  simulations.  We  use  the  terminology  of 
kinetic  Monte  Carlo  to  describe  our  methods. 

Let  M be  the  number  of  possible  MC  events  (or  state' transitions),  and  let  be  the 
rate  at  which  the  event  should  occur.  The  total  rate  is 

M 

R = (1) 

1=1 

For  an  accurate  simulation,  event  i should  occur  with  probability  r^/R.  (For  eqmhbrium 
MC,  the  calculations  are  the  same,  but  the  {cj}  are  transition  probabilities,  not  rates.) 

The  inner  loop  is  as  follows. 

1.  Choose  a random  number,  p,  in  the  range  [0,i?). 

2.  Find  the  corresponding  u such  that 

(2) 

t=i  »=i 

3.  Carry  out  event  u. 

4.  Update  those  r,  that  have  changed  as  a result  of  event  u;  update  R and  any  data 
structures  being  used. 

Steps  1 and  3 take  time  independent  of  M,  but  step  2 is  time-consuming.  Depending 
on  the  data  structure  needed  for  the  search  scheme,  the  step  4 time  may  or  may  not  grow 
with  M . If  a simple  hnear  search  is  used  for  step  2,  the  search  time  is  0{M)  and  the  step 
4 time  is  independent  of  M. 

The  current  state  of  the  art  [4-6]  is  the  binning  method  due  to  Maksym  [7].  To  facihtate 
the  search,  maintain  a data  structure  of  partial  sums;  there  are  \M/g']  partial  sums,  each 
containing  the  sum  of  g rates, 

39 

Sj=  ^ j = 1,. ..,  [M/p].  (3) 


2 


The  search  in  step  2 has  two  parts:  searcliing  for  the  right  bin,  which  takes  time  0{AI/(j) 
and  searching  witfiin  the  bin,  wliich  takes  time  0{g)-  The  updating  in  step  4 is  independent 
of  M.  Minimizing  the  total  time  leads  to  g = a depends  on  the  relative  time  in 

the  two  searches.  In  our  simulations,  the  time  used  is  reasonably  insensitive  to  a in  the 
range  0.5  to  1.5,  and  cv  = 1 is  used  in  the  results  presented  later.  Clearly,  the  computer 
time  per  simulated  event  is 

We  refer  to  Maksym’s  method  as  a 2-level  search  scheme.  Better  asymptotic  behaMor 
can  be  obtained  by  constructing  A'-level  schemes  with  K > 2.  The  lowest  level  is  identical 
except  for  a slight  change  in  the  notation, 

39 

£ r.,  j = 1,...,  (4) 

and  higher  levels  are  defined  recursively, 

39 

5W  ^ i = 1, . . . , IM/g”-'] , = 3, . . . , A'.  (5) 

t=l  + (j-l)g 

(The  ^’s  can  depend  on  k and  should  depend  on  K,  but  for  simphcity  this  is  not  reflected 
in  the  notation.)  Minimizing  the  total  time,  we  find  g is  0{M^^^)  and  the  toted  search 
time  is  0{K The  time  for  step  4 is  0{K),  independent  of  M.  The  extra  space 
required  for  the  data  structure  is  + • • • + 

For  a given  M,  the  best  asymptotic  time  behavior  is  obtained  by  using  the  largest 
feasible  A",  namely  the  one  for  which  there  are  only  two  items  in  each  partial  sum,  — 

2,  or  K = logj  M.  The  data  structure  is  then  a binary  tree.  The  search  time  and  the  step 
4 time  are  each  O(log2  M).  The  extra  space  required  for  the  data  structure  is  0{M). 


The  binary  tree  may  be  built  as  follows  (for  simplicity  of  notation,  we  assume  M = 2^). 
Let  = rj  and  generate  the  rest  of  the  tree  recursively. 

tW  ^ ^ k = 2,...,  K.  (6) 

Note  that  T[^'^  = R. 

Searching  the  binary  tree  is  done  recursively,  starting  with  T[^\  Start  with  j = 1, 
k = K . Recursively,  if  p < T2^Zi\  the  desired  event  is  in  the  left  half;  set  j = 2j  — 1. 
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Otherw'ise,  subtract  from  p and  set  j = 2j.  If  = 1,  we  are  done;  otherwise  set 

k = k — 1 and  continue. 

Updating  the  binary  tree  after  an  event  is  also  done  recursively  by  propagating  the 
effects  of  a change  from  k = 1 to  k = K.  Both  the  searching  time  and  the  updating 
time  are  0{K)  = 0{\og2  M). 

To  compare  these  methods,  we  simulated  a sohd-on-s'ohd  model  of  epitaxial  growth 
[4,5,8]  on  a simple  cubic  N x N lattice.  Atoms  are  deposited  at  a steady  rate  and  can 
hop  around  on  the  surface,  but  do  not  evaporate  from  the  surface.  Further  details  of  the 
model  may  be  found  in  [5,8].  Of  interest  here  is  the  computer  time  for  the  simulation, 
expressed  in  microseconds  per  simulated  deposition  or  hop  event. 

For  the  /i -level  methods,  g = was  used.  Simulations  were  done  on  an  IBM 

RISC  6000  Model  590  workstation^  with  one  gigabyte  of  memory.  For  each  data  point, 
five  trials  with  different  seeds  of  the  random  number  generator  were  done  and  the  median 
time  was  used.  As  showmin  Fig.  1,  for  typical  simulation  sizes,  about  N = 512,  the  2-level 
scheme  takes  more  than  seven  times  as  long  as  the  5-level.  Our  runs  on  other  computers 
(Sihcon  Graphics  Indigo2  and  Convex  C3820)  give  qualitatively  similar  results,  with  the 
2-level  scheme  taking  about  five  times  as  long  as  the  5-level  dii  N — 512,  but  the  details  of 
the  curves  vary  considerably,  depending  on  the  compiler/ computer  combination  and  the 
presence  of  cache  memory. 

For  the  shortest  times,  K should  grow  as  logM,  keeping  the  bin  size,  constant. 
Based  on  our  experiments,  bin  sizes  of  three  to  five  are  reasonable.  Speedups  by  a factor 
of  five  or  more  for  typical  problem  sizes  are  obtained  over  the  2-level  method. 


^Certain  commercial  equipment  may  be  identified  in  order  to  adequately  specify  or  describe  the 
subject  matter  of  this  work.  In  no  case  does  such  identification  imply  recommendation  or  endorse- 
ment by  the  National  Institute  of  Standards  and  Technology,  not  does  it  imply  that  the  equipment 
identified  is  necessarily  the  best  available  for  the  purpose. 
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FIGURES 


FIG. 
A K = 


. Log-log  plot  of  time  per  event  for  different  search  schemes.  Q)  K = 2;  □ A” 
1;  V = 5;  O binary  tree. 
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